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A statistical mechanical distance constraint model (DCM) is presented that explicitly accounts 
for network rigidity among constraints present within a system. Constraints are characterized 
by local microscopic free energy functions. Topological re-arrangements of thermally fluctuating 
constraints are permitted. The partition function is obtained by combining microscopic free energies 
of individual constraints using network rigidity as an underlying long-range mechanical interaction 
— giving a quantitative explanation for the non-additivity in component entropies exhibited in 
molecular systems. Two exactly solved 2-dimensional toy models representing flexible molecules 
that can undergo conformational change are presented to elucidate concepts, and to outline a DCM 
calculation scheme applicable to many types of physical systems. It is proposed that network rigidity 
plays a central role in balancing the energetic and entropic contributions to the free energy of bio- 
polymers, such as proteins. As a demonstration, the distance constraint model is solved exactly 
for the alpha-helix to coil transition in homogeneous peptides. Temperature and size independent 
model parameters are fitted to Monte Carlo simulation data, which includes peptides of length 10 for 
gas phase, and lengths 10, 15, 20 and 30 in water. The DCM is compared to the Lifson-Roig model. 
It is found that network rigidity provides a mechanism for cooperativity in molecular structures 
including their ability to spontaneously self-organize. In particular, the formation of a characteristic 
topological arrangement of constraints is associated with the most probable microstates changing 
under different thermodynamic conditions. 
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I. INTRODUCTION 



Network rigidity deals with a system of particles sub- 
jected to a set of constraints. Depending on the number 
and position of these constraints, the system will have a 
residual number of independent degrees of freedom. A 
simple way of characterizing the degree of mechanical 
stability of a given framework is to ignore the way con- 
straints are positioned, and to treat all constraints as 
independent. In this approximation, the number of inde- 
pendent degrees of freedom governing internal motions, 
F, in the framework is given by F = dN — Nc — d{d+l)/2, 
where d is the dimension of the system, A^ is the num- 
ber of vertices, Nc the number of constraints, and the 
trivial rigid body motions of the entire framework sub- 
tracted out. The use of constraint counting to deter- 
mine structural stability in macroscopic systems dates 
back to Maxwell ^1]. Nearly 25 years ago. Philips 2] 
realized that constraint counting is applicable to mi- 
crostructure in covalent glasses by treating central and 
bond-bending forces in covalent bonds as nearest and 
next nearest neighbor distance constraints. This simple 
global counting of constraints is commonly referred to as 
Maxwell counting, which may result in positive or neg- 
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ative values for F. A negative F indicates the network 
is over-constrained. Philips Q qualitatively explained 
why covalent glass networks with low average coordina- 
tion form more easily. Shortly afterward, the notion of 
rigidity percolation was introduced by Thorpe 3] , where 
depending on chemical composition a network would mi- 
croscopically be in a floppy or rigid state, having a well 
defined rigidity percolation threshold. Experiments 
have shown many physical properties in covalent glasses 
are related to the rigidity transition. In spite of the 
unique insight that the theory of network rigidity offers, 
it is unfortunate that it still remains a relatively obscure 
subject. An authoritative source on concepts of rigidity 
and its broad range of interdisciplinary applications can 
be found in Ref. [Q]. 

Network rigidity exhibits long-range character that 
makes calculating properties difficult using brute force 
methods on elastic networks . However, the mathe- 
matics of first order graph rigidity [l^, llj referred to 
in the physics literature as generic rigidity greatly sim- 
plifies calculations 0, ^3 ■ Atomic coordinates are not 
required in generic rigidity. Only the connectivity prop- 
erty of the network is important, making it possible to 
calculate many static mechanical properties exactly us- 
ing an integer based combinatoric algorithm. In particu- 
lar, the exact number of internal independent degrees of 
freedom can be calculated, all rigid substructures can be 
identified as well as all correlated motions that couple the 
network of rigid clusters. One such algorithm, referred 
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to as the pebble game, is available for general networks 
in two dimensions Il4l and for bond-bending networks in 
three dimensions Il5l|. A bond-bending network has the 
property that all angles between the central-force con- 
straints that stem outward from an atom are fixed. In 
addition, dihedral angles can be constrained. 

Covalent glasses are ideal systems to model as a 
quenched bond-bending network, where there is a natural 
separation between hard-strong forces (central and bond- 
bending forces) and soft-weak forces (torsional and non- 
bonding forces). The large gap in force strength justifies 
the treatment of covalent glass networks at room temper- 
ature to be modeled as a mechanical network — essen- 
tially a T = calculation. Recently, constraint counting 
has been applied to protein structure PJj where covalent 
bonds, salt-bridges, hydrogen bonds and torsional forces 
on resonant bonds (the peptide bond, for example) were 
modeled as mechanical distance constraints. By treating 
the folded protein structure as a quenched mechanical 
bond-bending network, flexible and rigid regions were 
identified and found to correlate well with biologically 
relevant motions. Network rigidity in proteins has also 
been found to correlate with protein folding pathways 
fvA Hsj l . The success of the T — calculations on pro- 
tein structure suggest that the folded state of the protein 
acts very much like a mechanical machine under the con- 
ditions responsible for the native fold to be thermody- 
namically stable. This result is reassuring, as it has been 
well appreciated that protein function is very precise in 
its response to molecules it encounters having a high de- 
gree of specificity that makes it appear to respond like 
a mechanical machine. This empirical observation moti- 
vated the use of network rigidity calculations at T = 
in the first place. In spite of the success that many me- 
chanical aspects of a protein fold can be quantitatively 
characterized, it is also well known [l^ l20l | that protein 
stability is a result of a delicate balance between many 
weak non-covalent interactions. In particular, enthalpic 
and entropic contributions must be part of the ledger of 
accounts to understand protein stability. 

The study of protein stability has motivated this work 
in generalizing the concept of network rigidity to be ap- 
plicable at finite temperatures in physical systems having 
interactions that do not divide into strong and weak com- 
pared to kT. When viewing a protein as a mechanical 
network, two serious problems immediately become ap- 
parent. Firstly, hydrogen bonds are continually breaking 
and forming consistent with thermal fluctuations, and 
secondly, hydrogen bonds have a wide variety of strength 
that is dependent on their local environment l2ll |2^ . In 
prior work an energetic cut-off criterion p^l23| was intro- 
duced to determine a set of hydrogen bonds to model as a 
constraint. As the energy cut-off was varied, a hierarchi- 
cal analysis of rigid clusters was used to characterize the 
protein structure. Unfortunately, the energy cut-off was 
not directly related to thermodynamic stability, nor the 
entropy from molecular flexibility was considered, which 
limited the range of validity of the (T=0) rigidity model 



to be near the native structure. These problems can be 
resolved by modeling microscopic interactions as distance 
constraints, where each distance constraint represents a 
free energy component within the system. Assigning free 
energy contributions to specific types of interactions is 
commonly done to interpret experimental measurements 
and in theoretical discussions on protein stability p4, 25] . 
However, the utility of such a decomposition is ques- 
tionable because in general it is not possible to obtain 
the total free energy by simply summing the free energy 
components |2^. It will be shown that the free energy 
of a system can be obtained from its free energy compo- 
nents by employing network rigidity calculations at finite 
temperature, which combines mechanical and thermody- 
namic points of views. 

In section m A Distance Constraint Model (DCM) is 
introduced that enables the partition function to be cal- 
culated in terms of an ensemble of mechanical frame- 
works. After the concept of a constraint is generalized 
to contain thermodynamic information, each mechanical 
framework of constraints provides an underlying interac- 
tion that couples enthalpic and entropic terms appear- 
ing in Boltzmann factors. In section IIIII two simple 2- 
dimensional toy models are worked out to illustrate the 
details involved in a calculation. As a final example, an 
exact solution of a distance constraint model for homoge- 
neous peptides that undergo an alpha-helix to coil transi- 
tion is considered in section Hvl In section ^ the results 
from all three models are discussed, and the standard 
Lifson-Roig model for a helix-coil transition is compared 
with the DCM. Conclusions are made in section IVTI 

II. DISTANCE CONSTRAINT MODEL 

Lord Kelvin said, "I never satisfy myself until I can 
make a mechanical model of a thing. If I can make a me- 
chanical model I can understand it!" The distance con- 
straint model (DCM) that will be introduced and care- 
fully discussed in the following subsections closely ad- 
heres to Kelvin's belief. The objective is to use a mechan- 
ical model to understand thermal stability in biopolymers 
(the focus of this paper) as well as other systems such as 
formation of chalcogenide glasses. 

The DCM begins by representing a macromolecule 
and interactions therein as a mechanical bar-joint frame- 
work. For a single static structure, generic network rigid- 
ity properties can be calculated exactly using a graph- 
algorithm that does not depend on geometrical coordi- 
nates of atoms, but only on the topological arrangement 
of distance constraints. Network rigidity is used here as 
an umbrella-phrase to refer to the following mechanical 
properties of a bar-joint framework: 

1. Identification of all rigid clusters, where each dis- 
tinct cluster of atoms forms a rigid body. 

2. Identification of all over-constrained regions, within 
which elastic strain energy resides. 
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3. Identification of all flexible regions, wherein the 
atomic structure can continuously deform. 

4. Identification of all independent constraints and de- 
grees of freedom. 

These basic mechanical properties are quite useful in 
characterizing a single static structure. In this paper, wc 
will generalize the mechanical description (at T=0) by 
employing an ensemble-based approach to account for 
thermodynamics. Thermodynamics determines the fate 
of a biopolymcr, albeit kinetic detours and traps. For 
example, a protein unfolds when an increase in confor- 
mational entropy outweighs a gain in enthalpy from an 
associated loss of many favorable intra-molccular non- 
covalent interactions. Furthermore, a functional protein 
in the native state is stable against thermal fiuctuations 
through enthalpy-entropy compensation. 

The distance constraint model (DCM) uses network 
rigidity as an underlying interaction. Through non-local 
mechanical interactions, network rigidity answers the 
question about which degrees of freedom are independent 
and directly relates to the non-additively of measured 
component free energies. Although the total enthalpy is 
additive, the entropy is not. This non-additive property 
of component entropies derives from not knowing which 
degrees of freedom in the system are independent or re- 
dundant. However, generic network rigidity properties 
can be calculated exactly with the pebble game by recur- 
sively adding one constraint at a time to build a frame- 
work. As constraints are added, some atoms will become 
part of a rigid cluster. A new constraint is redundant 
when added to an already rigid region and independent 
when it removes a degree of freedom. All distance con- 
straints are treated the same in the pebble game, and 
there is a clear distinction between a constraint and de- 
gree of freedom. 

In the DCM, interactions are represented as distance 
constraints, each characterized by an enthalpy and an 
entropy contribution assumed to depend only on local 
structural properties. Constraints are quantified as being 
strong or weak based on their entropy contribution. A 
(greater, lesser) entropy contribution implies a (weaker, 
stronger) constraint. The key aspect of the DCM is that 
stronger constraints must be placed in the network before 
weaker ones in order to generalize network rigidity to 
finite temperatures. This leads to a preferential ordering, 
which is implemented operationally as: 

1. Sort all constraints based on entropy assignments 
in increasing order, thereby ranking them from 
strongest to weakest. 

2. Add constraints recursively one at a time using the 
pebble game according to the rank ordering from 
strongest to weakest, until the entire structure is 
completely rigid. 

The DCM is mathematically well defined and physi- 
cally intuitive. The essential idea is that weak constraints 



allow more conformational freedom than do strong con- 
straints. Stronger constraints take precedence in defining 
rigid structures because weaker constraints are more ac- 
commodating. Thus, a weak constraint acts like a degree 
of freedom relative to a strong constraint. Consequently, 
the notion of a constraint and degree of freedom cannot 
be distinguished clearly once entropy price tags are intro- 
duced. Rather, a quantitative measure for conformational 
entropy is obtained for the structure, whereas the T = 
style of constraint counting simply regards the structure 
as completely rigid. In this way the DCM provides a 
natural mechanism for enthalpy-entropy compensation. 
For example, if by some fluctuation a strong constraint 
breaks (such as a hydrogen bond) there will be a desta- 
bilizing gain in enthalpy, but also a compensating gain 
in conformational entropy as a weaker constraint substi- 
tutes. The technical aspects and mathematical details of 
the DCM are now addressed. 



A. Relating Thermodynamics to Constraint 
Topology 

The distance constraint model (DCM) views a physi- 
cal system at a coarse-grain level as defining a mechan- 
ical bar-joint framework. A framework is constructed 
from distance constraints that are used to represent mi- 
croscopic interactions. Each distance constraint defines 
an equation of the form R = constant, where R is the 
distance between a pair of atoms. A microscopic interac- 
tion involving a group of atoms (more than two) can be 
modeled by more than one distance constraint, where the 
collection of distance constraints between different pairs 
of atoms are simply referred to as a constraint (without 
the word distance as a qualifier). A hydrogen bond is an 
example of a many body interaction that will be modeled 
as a particular type of constraint consisting of three (pair- 
wise) distance constraints. The enthalpy and entropy 
contributions from a specific type of interaction charac- 
terize the corresponding constraint type. Therefore, let 
{AHt,ASt) be the change in (enthalpy, entropy) that 
quantifies constraint type, f, when it is added to a frame- 
work. Over the ensemble of all accessible atomic config- 
urations, the many different geometries between atoms 
will potentially result in a vast number of constraint types 
that must be introduced. However, as demonstrated be- 
low, a remarkably few number of constraint types will 
often be sufficient to quantitatively capture the essential 
physics. 

The microstates of a system arc specified in terms 
of mechanical frameworks, J-', where each framework 
uniquely defines the topology of all distance constraints. 
The DCM is built upon the idea that each frame- 
work, JF, having a specific topology represents a mini- 
ensemble of bar-joint networks of strict distance con- 
straints within the tolerance allowed by the geometri- 
cal coarse graining. One framework consists of many 
possible atomic-coordinate realizations of strict distance 
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constraints. However, because generic rigidity properties 
are sought that do not depend on the geometrical de- 
tails of atomic coordinates, each realization in this mini- 
ensemble has exactly the same network rigidity proper- 
ties. Thus, the framework label, T , represents an en- 
semble of bar-joint frameworks sharing identical network 
rigidity properties that are calculated using strict dis- 
tance constraints. 

The relation to thermodynamics can be made because 
a framework uniquely identifies a mini-ensemble hav- 
ing constant constraint topology, enabling a free energy, 
given as G{J-), to be meaningfully assigned. To this end, 
the total enthalpy of a framework is given by 

AH{T)=J2^HtNt{J') (1) 
t 

where Nt is the number of constraints of type t that are 
present. By exploring all accessible atomic configura- 
tions, an ensemble of frameworks (each representing a 
distinct topology) is generated. The ensemble of frame- 
works partitions phase space into discrete parts, each 
having a constant enthalpy over a limited range of con- 
formational freedom. Therefore, the partition function is 
given by 

Z = J2^{T)e-^^"^^^ (2) 

where ^{J-) is the conformational degeneracy of frame- 
work J-. 

The novel aspect of the DCM is that the conforma- 
tional entropy, given by AS{J-') — k\n^{!F), is ob- 
tained by adding component entropies over independent 
distance constraints that are explicitly identified using 
generic rigidity. Simply adding component entropies over 
all distance constraints will generally lead to a dras- 
tic overestimate for fi(^). However, identification of 
whether a distance constraint is independent or redun- 
dant is not unique. The freedom in choosing which dis- 
tance constraints are independent is akin to the freedom 
in choosing an independent basis set of vectors to span 
a vector space. Consequently, the addition of compo- 
nent entropies over independent distance constraints will 
lead to multiple answers for AS{!F) if based on arbitrary 
selections. Therefore, an auxiliary preferential selection 
criterion for how to choose the optimal set of independent 
distance constraints is required. The crucial part of the 
DCM is that it enforces a preferential selection criterion 
that corresponds to the determination of the minimum 
possible value of AS{J-). 

The total conformational entropy for framework, JF, is 
given by 

A5(J-) = ^A5,4'')(^) (3) 
t 

where /j^-* is the number of independent distance con- 
straints of type t present in the framework as deter- 
mined by the preferential, selection criterion. The 



method for determining linearly independent constraint 
equations involves building a basis set by iteration, where 
a new constraint equation is checked for independence 
against the current basis set. If the new constraint equa- 
tion is independent, then the basis set expands. The pro- 
cedure is continued until all distance constraints in the 
framework are checked. The preferential selection crite- 
rion is defined as: Distance constraints with lower compo- 
nent entropies take precedence in the order that they are 
checked for linear independence. By applying the prefer- 
ential selection criterion in conjunction with exact con- 
straint counting for generic rigidity, the change in Gibbs 
free energy for framework, J-, is given as 

AG{J=-) = AH{T) - T AS{T) > ^ AGt{T) . (4) 

t 

Only in the case that all distance constraints in the 
framework are independent will AG{T) be equal to a 
straightforward sum over the component free energies, 
AGt{T), associated with each constraint type. The par- 
tition function is calculated as 

Z = ^2^'^^'"^^^ (5) 

in accordance to the standard form, except each mi- 
crostate corresponds to a generic mechanical framework, 
T , made up of (infinitely strong) holonomic distance con- 
straints, and the ensemble consist of all topologically dis- 
tinct frameworks. 



B. Generic Rigidity and Non-Additivity of Entropy 

Meaningful thermodynamic properties are directly tied 
to local atomic structure because of coarse graining over 
geometrical bins. To reflect the geometrical aspect of 
the DCM, the index t is represented by two indices (i, 9), 
where i now specifies the type of constraint and q labels a 
specific geometrical bin. For example, a hydrogen bond is 
a particular type of interaction, but its strength depends 
on its local geometry. The component free energy of the 
i-th type of microscopic interaction is expressed as a free 
energy function, AG^, which accounts for all atomic po- 
sitions of the group of atoms under consideration within 
the g-th geometrical bin. The process of obtaining a free 
energy decomposition (2^ (the set of AG^ used in the 
model) is not unique because different types of interac- 
tions will involve one or more overlapping atoms. Also 
there will be unavoidable many body effects, such as elec- 
trostatic interactions between the atoms of interest with 
all other atoms, including those in solvent. The non- 
uniqueness of a free energy decomposition can be used as 
an advantage in the process of defining constituent types 
of constraints. 

An effective strategy in employing the DCM is to de- 
fine a minimum number of constraint types with lim- 
ited number of geometries that will yield a desired level 
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of accuracy in predictions. For each {i,q), the enthalpy 
and entropy contributions denoted as (ATJ^, A^^) can in 
principle be determined self-consistently in lieu of not be- 
ing unique. Self consistency is satisfied when the free en- 
ergy assignment to small clusters of atoms used in defin- 
ing constraint types locally obey the preferential selection 
criterion. This means that various clusters of atoms (for 
example, those within an amino acid or hydrogen bond) 
define subsystems that are treated the same way as the 
full system. Knowing the thermodynamic properties of a 
cluster of atoms allows constraint types to be defined 
and characterized with a Ai/* and AS'*. It is worth 
mentioning that in principle, a hierarchical set of con- 
straint types can be constructed iteratively by defining 
constraints with lowest component entropies first, and in 
succession defining constraints with the next lowest com- 
ponent entropy, etc. 

The procedure to determine the local thermodynamic 
functions {AH^, AS^) for all constraint types and their 
geometries constitutes a preliminary step in the DCM. In 
principle explicit calculations for AG* could be made us- 
ing accurate physical theories (i.e. quantum mechanical 
calculations) involving clusters of atoms within a coher- 
ent potential approximation scheme. This type of bottom 
up approach should be tractable and the results would 
be very useful. However, these difficult calculations can 
be sidestepped (completely or in part) by writing down 
the parametric form of a microscopic free energy func- 
tion with empirically derived parameters. The important 
outcomes are: 1) interactions are modeled as constraints 
characterized by two quantities (AiJ*, AS**) that can be 
determined by theoretical means or fitting to large sets 
of experimental data, and 2) the DCM parameters can 
be expected to be transferable between systems that are 
well described by the same set of constraint types. 

The DCM invokes a probabilistic interpretation that 
all distance constraint realizations between atoms are 
uniformly distributed within a geometrical bin. By al- 
lowing each atom a finite amount of freedom, it is en- 
sured that the framework can be treated as generic. Al- 
though there will be configurations that have atypical 
atomic positions, these will be of zero measure. There- 
fore, the system is modeled as a collection of generically 
placed holonomic constraints, for which many mechan- 
ical properties can be calculated using exact constraint 
counting algorithms. The connection to thermodynam- 
ics enters into the rigidity calculation by determining the 
correct Boltzmann weight assignment to each mechani- 
cal framework, which is related to the non-additivity of 
component entropies. The selected set of independent 
constraints under the preferential selection criterion does 
not depend on coordinates in so far that the same frame- 
work is maintained over a limited range of conformational 
freedom. This limited range of conformational freedom is 
quantified by the total entropy, AS{J-), which depends 
strongly on the topology of distance constraints present 
in the system. 

Calculating the exact value for AS'(J^) will unfortu- 



nately not be possible in the DCM. The preferential se- 
lection criterion is enforced to obtain the best estimate 
for each framework. Fundamentally, overlap in phase 
space can occur when two constraints are independent 
but not orthogonal. The direct result of adding com- 
ponent entropies associated with only independent con- 
straints is that less phase space will be "double counted" . 
Therefore, adding component entropies over independent 
constraints gives an upper bound ior AS{J^). The prefer- 
ential selection criterion ensures the lowest possible upper 
bound because the strongest distance constraints defined 
by the smallest entropies are taken as independent before 
weaker distance constraints having larger entropies. The 
utility of the DCM will depend on the degree of accuracy 
in estimating conformational degeneracy. Note that dis- 
tance constraints not sharing atoms are orthogonal, and 
do not contribute in over-counting phase space. Although 
the distance constraints that share atoms will not gen- 
erally be orthogonal, by construction of a self-consistent 
hierarchical series of constraints, phase space overlap be- 
tween themselves locally is correctly taken into account. 
Therefore an accurate can be expected by using 

a complete set of self consistent constraint types. The 
phrase "complete set" is used to mean that for any posi- 
tion of atomic coordinates, a framework is always defined 
such that after all constraints are placed it is rigid. As 
more constraint types are defined, a framework becomes 
increasingly more over constrained, which can only lead 
to a better lowest upper bound. 



The preferential selection criterion has a simple phys- 
ical interpretation. Each constraint that is added to 
a system can potentially reduce entropy. However, a 
redundant distance constraint does not reduce entropy 
[27| . This is because when a constraint is added to a 
rigid region that is formed by stronger constraints, the 
weaker constraint will accommodate the structural geom- 
etry dictated by the cohort of stronger constraints [28| . 
The strength of a constraint (strong or weak) is tied to 
phase space volume. Therefore, a clear distinction be- 
tween a constraint and degree of freedom is not possible. 
The rigidity calculation at finite temperature treats con- 
straints and degrees of freedom on equal footing in the 
sense that weaker constraints act as degrees of freedom 
relative to stronger constraints. The entropy loss associ- 
ated with an over constrained region is paid at a premium 
by the strongest member constraints. Fortunately, the 
pebble game algorithm |14L ITsI l for determining distance 
constraint independence is based on a recursive proce- 
dure of building a framework one constraint at a time. 
The new implementation only requires using a presorted 
list of distance constraints from strongest to weakest. It 
is worth noting that this algorithm does not model a ki- 
netic process as the constraints in a particular framework 
are present all the time. Rather, the entropy loss from a 
constraint has to do with its effectiveness relative to all 
other constraints in the framework. 



C. Quenched and Fluctuating Constraints 

The term quenched constraint refers to a constraint 
type that wiU be present among a particular group of 
atoms in all frameworks of the ensemble. For example, 
over the temperature range of biological importance, co- 
valent bonding between atoms within a protein is mod- 
eled as a set of quenched constraints. Furthermore, the 
central and bond-bending forces that make up covalent 
bonding are modeled by constraints having microscopic 
free energies associated with a single geometrical bin. 
The torsional force component will also be modeled by 
a quenched constraint (as the torsional force is always 
present) but will have a microscopic free energy, AG*, 
with multiple geometrical bins (labeled by q) depending 
on the dihedral angle. A system modeled by a complete 
set of quenched constraints will generally be associated 
with an ensemble of frameworks because the enthalpic 
and entropic characteristics of distance constraints de- 
pend on local geometry. In the extreme case where only 
one framework is accessible, the DCM will not provide 
optimal accuracy whereas normal mode analysis is more 
appropriate. For example, if a FCC solid is modeled us- 
ing one central force constraint type, then the DCM is 
equivalent to the Einstein model. 

The term fluctuating constraint refers to a constraint 
type that may or may not be present among a partic- 
ular group of atoms having a fixed geometry. When a 
fluctuating constraint is present, it is associated with 
a microscopic free energy, AG^, in the same way as a 
quenched constraint. However, a fluctuating constraint 
is not strictly tied to geometry because it may not be 
present. The DCM allows for fluctuating constraints to 
account for degrees of freedom (dof) that are not explic- 
itly part of a system. For example, solvent dof couple 
to protein atoms defining a system. The solvent-protein 
interactions are modeled as fluctuating constraints on 
the system. In this way, hydration shells around pro- 
tein atoms are modeled as fluctuating constraint types 
characterized by enthalpy, AH^, and entropy, AS*^, con- 
tributions that account for the many body interactions. 
Even more basic is the hydrogen bond. Hydrogen bond- 
ing is modeled as a fluctuating constraint because 1) the 
protein atom electronic dof are not explicitly described, 
and 2) solvent dof compete with intra-molecular hydro- 
gen bonding for a given geometry. Thus, the DCM pro- 
vides a real-space description involving mechanical con- 
straints that directly accounts for fluctuating hydrogen 
bonding, such as found in proteins and water. 



D. Temperature Independent Model Parameters 

The enthalpy and entropy contributions, (A_ff*, AS"*) 
assigned to the various constraint types are functions of 
temperature, pressure, and other thermodynamic con- 
ditions dealing with the chemical environment, such as 
pH, ionic strength or whether the local geometry is in 
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FIG. 1: (a) The torsion interaction within the backbone of 
an amino acid is modeled by two distance constraints shown 
as dashed lines that lock two dihedral angles. Except for pro- 
line, the torsion constraint is characterized by (V,, 25q), where 
q denotes a geometrical bin within a two dimensional {(j>, tp)- 
space. When the geometry is such that both angles fall within 
region q, the energy is Vq and each distance constraint has Sq 
pure entropy. For the a-helix to coil transition, q represents 
either a a-helical or coil geometry, (b) The hydrogen bond 
interaction is modeled by three distance constraints shown 
as dashed lines. The hydrogen bond constraint is character- 
ized by {Uq, 875), where q labels geometrical bins that can be 
defined in different ways. For the Q-helix to coil transition, 
q represents the geometry formed by spanning across three 
consecutive amino acids that can independently be in either 
a-helical or coil geometries. 



a hydrophobic or polar neighborhood. Therefore, cau- 
tion must be exercised in the ordering of the constraints 
from strongest to weakest, because this ordering may 
change as the thermodynamic conditions change. Conse- 
quently, the environmentally induced re-ordering of con- 
straint types by relative strength could potentially cause 
dramatic conformational change. However, the utility of 
the DCM can best be appreciated by using a simplified 
description. 

Model parameters will be taken as constants. Further- 
more, the entropic term will be distributed equally over 
all the distance constraints that model a particular con- 
straint type. Then, all microscopic free energies will have 
the generic form 

AG; = ei-Tk m% (6) 
where is energy, 7* is a dimensionless variable referred 
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to as pure entropy, and m* is the number of distance con- 
straints that are used to model the i-th constraint type. 
Pure entropies are taken to be positive because they are 
fundamentaUy related to the number of accessible quan- 
tum states that are associated with a specified geometri- 
cal bin tolerance, given by e^i > 1. Figure ^ shows two 
example constraint types that will be used in section Hvl 
to model an a-helix to coil transition. A constraint type 
is now generically characterized as (e^, m*7g). These pa- 
rameters can be interpreted as being derived by Taylor 
expanding to first order the true microscopic Gibbs free 
energy about some temperature of most interest. Analo- 
gous to Eq. Q , the total energy of a framework is given 
as 



E{T)= ' 



(7) 



where rf^.{J^) equals (1,0) when the j-th constraint of 
the i-th type is (present, or not present) within the q-th 
geometrical bin. Analogous to Eq. |(2Jl the total pure 
entropy of a framework is given as 



(8) 



where ct* {J-) is the number of independent distance con- 
straints within the j-th constraint of the system, in accor- 
dance with generic rigidity and the preferential selection 
criterion. Note that {0, 1, . . . m*} are the possible values 
that cy^q.{J-) can take. 

The partition function is now written as 



E n 



(9) 

where the form of Eq. @ suggests that the energy and 
entropy contributions are independent. However, not 
only do the values of { cr* } depend on calculations from 



generic rigidity, but when ifq-i^) 



then al.{T) = 0. 



Thus, the energy and entropy of each framework is cou- 
pled through topology via the underlying interaction of 
network rigidity. For example, consider the entropy loss 
associated with the formation of a hydrogen bond. As 
shown in Fig. ^ the hydrogen bond constraint is mod- 
eled as three distance constraints. For a particular ge- 
ometry, the hydrogen bond contributes energy Uq and it 
contributes {0, 7g, 27^, 875} amount of pure entropy to 
the system, depending on if it has {0, 1, 2, 3} indepen- 
dent distance constraints. If 7^ is comparatively small 
indicating a relatively strong distance constraint, then 
the greatest entropy loss for the system occurs when all 
three distance constraints are independent. In contrast, 
if 7q is comparatively large indicating a relatively weak 
distance constraint, then the greatest entropy loss for the 
system occurs when all three distance constraints are re- 
dundant. Since the results depend on the topological 



a) 






FIG. 2: A small two dimensional ring molecule in the shape 
of a quadrilateral. The shaded regions schematically show the 
allowed geomentrical variation for fixed topology indictive of 
the degree of flexibility. Conflguration (a) is topologically 
distinct from (b) and (c). For identical atoms at each corner, 
configurations (b) and (c) represent the same topology of con- 
straints, but are distinct otherwise. The framework in (a) is 
refered to as state H; where it has greater energy and confor- 
mational entropy than (b/c). The (b and/or c) framework is 
refered to as state L; where the bond along the diagonal leads 
to a lower energy and conformational entropy than (a). 



arrangement of all constraints in the system, no a pri- 
ori statement can he made about whether the formation 
of a hydrogen bond will supply a favorable or unfavorable 
entropic contribution. 



III. TOY MODELS IN TWO DIMENSIONS 

The (internal) partition function for the two dimen- 
sional molecule shown in Fig. 12 is calculated to illustrate 
basic concepts. The molecule consists of four identical 
atoms connected together by four strong central force 
bonds forming a quadrilateral. The central force (cf) 
bonds are modeled as quenched constraints character- 
ized by energy, Ucf and pure entropy 7c/ . Four torsional 
forces are also modeled as quenched constraints. In 2D 
the torsion force (tf ) is a function of the angle between a 
pair of central force bonds. It is modeled as a next near- 
est neighbor distance constraint characterized by energy 
Vt f and pure entropy 5t / . The torsional free energy sur- 
face is assumed shallow over a large range of angles. A 
hydrogen bond (hb) in 2D is considered a single central 
force, and is modeled as a fluctuating constraint charac- 
terized by energy Uhb and pure entropy ^hb- Within a 
length tolerance, a hydrogen bond can form between a 
pair of atoms along either diagonal of the quadrilateral. 

As Fig. El shows, there are only two distinct types 
of frameworks, labeled as {L and H) when the hydrogen 
bond (is, is not) present. This is a two level system (three 
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states are required for distinguishable atoms). Employ- 
ing the DCM, the first step is to rank order the distance 
constraints from strongest to weakest. This ranking is 
based on sorting the pure entropies from lowest to high- 
est, assumed given as: 

pure entropy: 7^/ < ^ihh < ^tf 

rank: 1 2 3 ^ ' 

The second step requires calculating the total energy and 
pure entropy for each framework using the preferential 
selection criterion. In state H there are 8 distance con- 
straints (4 cf and 4 tf ) and in state L there are 9 distance 
constraints (4 cf, 4 tf and 1 hb). There are 8 dof, three 
of which involve global translations and rotations. Five 
distance constraints will always be independent making 
the framework rigid. From Eqs. iQandlSl) it follows that: 

State H: th = ^icf + Sff Eh = Wcf + Wt / 
State L: TL = 47^/ + 7^6 El = Wcf + Wtf + Uhb 

(11) 

Therefore, the (internal) partition function is given as 

Z = e^^e-'^^^ + e^"e-P^" . (12) 

With Uhb < as expected for chemical bonding, the 
states (L, H) will be more probable at (low, high) tem- 
peratures respectively. Since for both states, the energy 
and pure entropy terms associated with the cf-constraints 
and the energy terms for the tf-constraints are the same, 
the partition function simplifies to 

Z = Zo {e^^^e^'^^^" -I- e^^!\ (13) 

where Zq contains the terms common in both L and 
H states. This example illustrates a general result 
that the strongest quenched constraints play a passive 
role. Molecular cooperativity is controlled by competi- 
tion among weaker interactions. It is worth mention- 
ing that if the two level approximation does not pro- 
duce a suHiciently accurate temperature response, then 
the model parameters could be regarded as temperature 
dependent functions. Alternatively, the single geometri- 
cal bin for the assumed weakly varying (as a function of 
temperature) torsional free energy can be further subdi- 
vided to better account for thermodynamic response by 
creating more terms in the partition function. 

The (internal) partition function for a more interest- 
ing two dimensional molecule shown in Fig. |21 is calcu- 
lated. This molecule consists of five backbone and five 
side-chain atoms connected by central forces. A side- 
chain atom at the end of the chain can swing around the 
backbone atom, but it is assumed that a potential bar- 
rier must be crossed. The highest point of the energy 
barrier is when the side-chain atom is co-linear with the 
backbone chain. Therefore, the molecule is regarded to 
have four topologically distinct conformations, each hav- 
ing the same characteristic energy basin. Finally, side- 
chain atoms that are in sufficient proximity of one an- 
other can hydrogen bond. 




FIG. 3: A small two dimensional chain molecule. Back- 
bone atoms are denoted by filled circles. There are only 
four topologically distinct conformations (a-d) accessible to 
the molecule as it deforms during the process of hydrogen 
bonds breaking and reforming. Dashed lines represent hy- 
drogen bonding. Left side: All conformations have a large 
conformational degeneracy. Right side: When all hydrogen 
bonds are present the molecule has much less conformational 
degeneracy. In particular, for conformation (d) a Rigid state 
is defined when all four side-chain atoms form a rigid cluster 
from hydrogen bonding. 



The central force constraint is characterized by (J7c/j 
7c/), and the hydrogen bond constraint is characterized 
by ([/, 7). There are two types of torsion force con- 
straints involving angles between BBB atoms or BBS 
atoms, where B and S represent backbone and side-chain 
atoms respectively. The torsional constraint type for the 
BBB angle is characterized by {Vbbb, Sbbb) and the 
torsional constraint type for the BBS angle is character- 
ized by (V, S). The distance constraints are now ranked 
from strongest to weakest, assumed given as 

pure entropy: 7^/ < 7 < (5 < Sbbb /-|.^ 
rank: 1 2 3 4 ^ ' 

Since both torsion constraint types are quenched con- 
straints, it follows that the pure entropy parameter for 
the BBB type of angle is always irrelevant for all frame- 
works in the ensemble. This example illustrates an im- 
portant point that weak forces often need not be associ- 
ated with an entropy term, because they will always be 
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n 





1 


2 


3 


4 


5 


6 


gin) 


4 


18 


33 


32 


18 


6 


1 



TABLE I: Hydrogen bond configurational degeneracy. 

redundant. Nevertheless, many weak forces can still play 
an important role in the energetics. 

There are a total of 112 possible frameworks, corre- 
sponding to 2* different frameworks (due to fluctuating 
hydrogen bonds) for each of the topologically distinct 
conformations shown in Fig. Oibc and 2^ frameworks 
for the conformation shown in Fig. Once all the 

central force constraints are placed (first) there are 8 in- 
ternal dof remaining in the molecule. If no hydrogen 
bond constraints are placed, then the total pure entropy 
of the molecule will be 97c/ -I- 86, which gives the max- 
imum possible value. As hydrogen bond constraints are 
added, the total pure entropy will decrease. The best 
chance of finding a redundant hydrogen bond is when 
the maximum number is present for each distinct topol- 
ogy. By inspection, only one framework out of 112 has 
a redundant hydrogen bond constraint, corresponding to 
the six hydrogen bonds all simultaneously present in the 
conformation shown in Fig. Recall that the param- 
eters associated with the quenched constraints common 
to all frameworks can be factored out. Therefore, rela- 
tive to the conformations containing no hydrogen bonds, 
the change in Gibbs free energy, AG(n), for the molecule 
having n hydrogen bonds is given by 

J AG(n) = nU - kT[n 7 + (8 - n) (5] for 71 = 0, 1, ... 5; 
\ AG(6) = 6U- kT[5 7 + 3 (5] 

The factor of (8 — n) appears because each independent 
hydrogen bond constraint eliminates an angular dof. The 
remaining (weakest) torsion force constraints rigidities 
the molecule. 

In this example many of the frameworks have degen- 
erate Gibbs free energy. The Gibbs free energy already 
accounts for conformational degeneracy, but there is also 
a configurational degeneracy in the number of hydrogen 
bond combinations that are possible. Therefore, the par- 
tition function is written as 

Z = Y^g{n)e-^^^(-^ (15) 

n 

where g{n) is the number of frameworks with n hydrogen 
bonds. The values of g{n) for different n are tabulated in 
Table ^ which is obtained by straightforward counting. 

The heat capacity is plotted in Fig0^, showing a peak 
near 310 Kelvin, where the model parameters were fixed 
to convenient values to show interesting features. This 
peak is a manifestation of a structural transition from the 
Rigid state (defined in Fig. (SJl) at low temperature to a 
Flexible state at high temperature. The degree of rigidity 
is also shown by plotting the equilibrium probability, Pji, 
for the molecule to be described by a framework with 5 
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FIG. 4: (a) Heat capacity as a function of temperature, 
(b) Probability for the molecule to form a rigid structural 
unit. The selected parameters were obtained by choosing the 
marked point on the phase diagram in Fig. |3 and fixing the 
transition temperature to be near 310 Kelvin. 



or 6 hydrogen bonds, where 

P^(T) = (e-/3^G(6) _^ ^(5) g-/3AG(5))/^ (^g) 

represents only the frameworks that form a rigid struc- 
tural unit. The probability for being in the rigid state is 
used as an order parameter. A phase diagram is shown in 
Fig. |S1 where the solid line corresponds to the maximum 
heat capacity used to locate the transition temperature. 
The shaded area defines a broad transition region defined 
as (0.1 < Pn < 0.9) indicating no substantial preference 
for either the rigid or flexible states. 

IV. ALPHA HELIX TO COIL TRANSITION 

The DCM is employed to describe a transition from a 
stable a-helix structure that is rigid at low temperature 
to a flexible coil involving many disordered conformations 
at high temperature. The backbone of a homogeneous 
peptide chain, as depicted in Fig. |^, is considered for 
simplicity. Compared to the Zimm-Bragg l29!| or Lifson- 
Roig [33 models, the DCM is mathematically more com- 
plicated because network rigidity is a long-range interac- 
tion that will be explicitly quantified in terms of a direct 
product between a rigidity state space and a conforma- 
tional state space, from which a transfer matrix is con- 
structed. 

Four constraint types are used here to model central, 
bond-bending and torsional forces involved in covalent 
bonds as well as hydrogen bonds. The strongest two 
constraint types, modeling the central and bond-bending 
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U/kJ 



FIG. 5: The phase diagram of the two dimensional chain 
molecule. The difference in pure entropy between the hydro- 
gen bond and torsional force constraints, and the hydrogen 
bond energy scaled by thermal energy are the only two rele- 
vant variables. 



forces, are placed in the network before the weaker con- 
straint types. Thus, a chain of n amino acids has 2n dof 
along the backbone because only the (j) and ip dihedral 
angles in each amino acid (proline is is not considered 
here) are free to rotate. The energy and pure entropy 
parameters for the central and bond-bending constraint 
types are not of concern because they play a passive role 
in the partition function, as explained in section llTTl The 
remaining two constraint types depend on the local con- 
formation of the backbone as determined by the (j) and 
^p dihedral angles. Explicit side-chain to side-chain and 
side-chain to backbone interactions are not considered in 
the analysis given here. 

The third constraint type describes a torsion interac- 
tion. Torsion constraints along the backbone are parti- 
tioned into distinct geometrical bins depending on the 
(j) and '0 angles. For example, different bins can be de- 
fined using a Ramachandran plot |^ Is^l for each type 
of amino acid. Here, the a-helical and coil geometries, 
labeled a and c respectively, are considered to be the 
only two accessible conformational states. The coil ge- 
ometry, c, includes all other secondary structures (non 
a-helical) such as a /3-strand, 3-10 helix or left-handed 
a-helix. The (energy, pure entropy) of the a-helical and 
coil torsion constraints are given by {Va, 2Sa) and {Vc, 
2Sc) respectively. As shown in Fig. the torsion con- 
straint contains two distance constraints to lock the (j) and 
-0 angles. Each distance constraint carries a pure entropy 
of Sa or Sc in the a-helix or coil geometry respectively. 

The fourth constraint type describes hydrogen bond- 
ing. For simplicity, only backbone hydrogen bonds be- 
tween the carbonyl oxygen of the z-th amino acid and the 




FIG. 6: (a) The backbone of a peptide chain. The dihedral 
angle of the peptide bond (C-N) cannot rotate. The long 
curved dashed line represents a possible hydrogen bond, (b) 
An example of poly-alanine. The dihedral angle between C^- 
C/3 can rotate. 



amine nitrogen of the (z -I- 4)-th amino acid are consid- 
ered accessible. The (energy, pure entropy) for a hydro- 
gen bond constraint are given by (Uxyz, '^Ixyz) where a;, 
y and z specify the local (a or c) backbone geometries of 
the i + 1, i + 2 and i + S amino acids that are spanned. 
As shown in Fig. ^p, a hydrogen bond constraint con- 
tains three distance constraints, where each distance con- 
straint carries a pure entropy of jxyz ■ Noting that there 
are eight possible geometries, each requiring the two pa- 
rameters Uxyz and jxyz, gives a tally of 16 parameters 
for the hydrogen bond constraint type. 

The peptide chain is decomposed into triplets, denoted 
by where x, y and z represent a or c geometries 

for the {i, z -I- 1, i + 2} amino acids. To account for hy- 
drogen bond fluctuations, a triplet may or may not have 
a spanning hydrogen bond. Another variable, Xi = (1,0) 
is used to specify whether a hydrogen bond constraint is 
(present or not) across the i-th triplet. When present, a 
hydrogen bond spans the i-th triplet by connecting the 
i—1 amino acid to the i+i amino acid. The greatest num- 
ber of hydrogen bonds that can form within an a-helix 
of n amino acids is n — 4, since the only triplets that can 
have a spanning hydrogen bond are (i = 2, 3, . . . , n— 3). 
Note that the variable \i corresponds to the i-ih amino 
acid in the chain, and therefore it is associated with the 
leading edge of a triplet. A triplet (not at the ends) will 
have 16 possible conformational states corresponding to 
8 different local-geometries with or without a hydrogen 
bond. The complete specification of the conformation of 
a triplet has the general form A[a;?/z]. An energy Uo is in- 
troduced for triplets of the form 0[xj/2:], which represents 
the hydrogen bond energy resulting between the peptide 
backbone and solvent. Therefore, Uq is an additional hy- 
drogen bond parameter (17 total) in the DCM considered 
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here. 



A. Rigidity Propagation Rule 

To facilitate exact constraint counting subjected to the 
preferential selection criterion, the degree of rigidity for 
a triplet is specified by a local rigidity state, denoted 
as |lrs). The local rigidity state contains the minimum 
amount of information about rigidity at the end of a chain 
such that when the next amino acid is added, the local 
rigidity state of the end triplet is uniquely specified. The 
set of all accessible local rigidity states, {|lrs)}, will serve 
as a basis set for a rigidity state space. A complete basis 
set will be generated using the rigidity propagation rule. 

Each triplet has 6 dof, 6 torsional force distance con- 
straints and when there is a spanning hydrogen bond 
3 additional hydrogen bond distance constraints. The 
pure entropies of each type of distance constraint is rank- 
ordered from 1 to 10 because there are 8 different j^yz 
and 2 different Sx assuming no degeneracies. A tor- 
sional force distance constraint (tfdc) and a hydrogen 
bond distance constraint (hbdc) lock dihedral angles dif- 
ferently. A tfdc is confined to lock a specific dihedral 
angle, whereas a hbdc spans all 6 dof within a triplet. A 
hbdc can be used to lock any of these 6 dof, and should 
lock the one which will minimize the total conformational 
entropy of the chain. In this sense, hydrogen bond dis- 
tance constraints are promiscuous. Consequently, the dof 
that is best to lock cannot be determined solely on the 
local triplet conformation because network rigidity is a 
long-range interaction. Therefore, an algorithm for prop- 
agating the local rigidity state must be established. 

A local rigidity state specifies the current rank assign- 
ment of constraints used to lock the first 4 dof in a triplet. 
The rank assignment corresponds only to independent 
constraints. The local rigidity state is represented as 

|lrs) = \ri,r2,r3,r4) (17) 

where is the rank of the distance constraint that locks 
the fc-th dihedral angle in a triplet. The ranks of the last 
two dihedral angles within a triplet will become impor- 
tant in determining the local rigidity state of the next 
triplet upon propagation. The explicit form for |lrs) in 
Eq. (|17|l provides a book-keeping device to calculate the 
preferential sum of pure entropies over independent con- 
straints. The algorithm for propagating rigidity from left 
to right takes the form: 

1. Given r2, rs, r4): Retain the 4 temporary rank 
assignments and augment the 2 ranks from the 
torsional constraint on the third amino acid, thus 
forming a temporary template involving 6 ranks, 
given by: {ri, r2, ra, r4, rs, rg} 

2. If no hydrogen bond is present continue to the next 
step. Otherwise, perform the following operations 
when a hydrogen bond spans the new triplet. At- 
tempt to place one distance constraint at a time. 



each having a rank of rht- Find the maximum rank, 
denoted as r^^\ out of the six current ranks in the 
template. The superscript indicates that this is 
the maximum rank, and the index i specifies its lo- 
cation within the template. If r^b > r^^\ continue 
to the next step because this and any of the remain- 
ing hydrogen bond distance constraints are redun- 
dant. Otherwise, replace the maximum rank by r^b- 
Working from right to left (the direction opposite 
to propagation) find the next maximum rank, de- 

(2) (2) 

noted as . If r^- > rhb then swap ranks. That 

(2) 

is, let ri = rj and rj = rhb- Continue the process 
of swapping rank r^b with the next greatest rank to 
its left, until it can no longer be shifted to the left. 
Continue to the next step when all three hydrogen 
bond distance constraints have been place. 

3. The first two degrees of freedom in the triplet are 
permanently locked by distance constraints that are 
associated with the ranks ri and r2 in the template. 
The remaining four ranks in the template define the 
current local rigidity state of the new triplet given 
as: \r[ = r^, r'^ = r4, rJ, = r^, r'^ = re). Repeat 
this process (back to step 1) until the propagation 
through all triplets is finished. 

Step 2 can be understood conceptually. Ranks within 
a template act as a dof relative to a hbdc rank whenever 
they are greater than r^b, otherwise they act as a con- 
straint. Among the ranks acting as a dof, a lower rank 
acts as a constraint relative to a greater rank. Therefore, 
the greatest rank should be replaced by r^b- However, 
it could happen in a future test (as the chain is propa- 
gated from left to right) that the largest rank within the 
current template could be replaced by a different hbdc 
that spans a different triplet downstream. If this hap- 
pens, it would be better to use the current hbdc to lock 
the second highest rank. Replacing the highest rank, or 
replacing the second highest rank, etc, depends on the 
relative rank of a future hbdc, if any appear at all! This 
makes the transfer matrix approach different than the 
usual case, because rigidity is non-local where the con- 
formations down the chain will affect the optimal rank 
substitution at the current triplet. 

The first hbdc encountered down the line that over- 
laps with part of the current triplet will be effective as 
a constraint within the current triplet only if its rank is 
lower than the greatest rank, r^^^ found in Eq. I|17|l . 
The second effective hbdc must have a rank lower than 
the second greatest rank, r'^^\ If no effective hbdc is en- 
countered, it is best to replace r^^^ with r^b in step 2 
of the algorithm. If one effective hbdc is encountered, it 
is best to replace r^^^ with r^b- More generally, if n ef- 
fective hbdc are encountered, it is best to replace r("+^) 
with rhb if possible. All these cases are properly handled 
by building into the definition of a local rigidity state 
a chain reaction that automatically swaps higher ranks 
into lower ranks when needed. The chain reaction is ini- 
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tialized in step 2 by the process of swapping ranks within 
a triplet from highest to lowest working in the opposite 
direction of propagation. The outcome of the above algo- 
rithm, is that both the long-range interaction of rigidity 
and the global preferential selection criterion are properly 
described. 

Figure [71 shows how the rigidity propagation rule is 
implemented on a short chain in a particular framework. 
The initial description of the chain includes the ranks 
of all torsion and hydrogen bond constraints that are 
present. This framework contains 18 redundant con- 
straints since the chain in any conformation is always just 
rigid (isostatic) whenever there are no hydrogen bonds 
along the backbone, and here there are 3 x (6 hydrogen 
bonds) extra distance constraints. The final description 
shows the ranks of only independent distance constraints 
that remain after being permanently assigned in step 3 
of the propagation rule. The final ordering of ranks gen- 
erally depends on the direction of propagation, but the 
final distribution of ranks (i.e. number of independent 
constraints having rank 1,2,.. .) is invariant. Moreover, 
the final rank-distribution is identical to that of a prefer- 
ential selected set of independent constraints obtained by 
placing the strongest distance constraints before weaker 
ones in otherwise arbitrary order. 

Referring to Fig. [T] the entire process of propagating 
from left to right is shown. The 1st triplet has a local 
rigidity state given by |5, 5,3, 3). This 1st triplet does 
not have a spanning hydrogen bond, therefore, the next 
triplet (after the first propagation) has a local rigidity 
state given by |3,3, 5,5). During the first propagation 
step, each tfdc within the 1st amino acid is recorded 
as independent, locking the (f>i and tpi dihedral angles. 
The pure entropy associated with these two distance con- 
straints are recorded in terms of the two ranks, {5,5}. 
For the second propagation step, the spanning hydrogen 
bond across the 2nd triplet changes the temporary rank 
assignments as follows: 

initial |lrs): |3, 3, 5, 5) 

tfdc template: {3, 3, 5, 5, 3, 3} 

plus 1st hbdc: 2, 

intermediate 1: {2, 3, 3, 5, 3, 3} 

plus 2nd hbdc: 2, (18) 

intermediate 2: {2, 2, 3, 3, 3, 3} 
plus 3rd hbdc: 2 

intermediate 3: {2, 2, 2, 3, 3, 3} 

final I Irs): |2, 3, 3, 3) 



The (1)2 and 'ip2 are considered to be locked by two of 
the promiscuous hydrogen bond distance constraints, and 
recorded by the two ranks {2,2}. 

The rigidity propagation rule applied to a specified 
framework, T, allows the total pure entropy, t{T), to 
be calculated as the sum over pure entropies associated 
with the ranks of the distance constraints used to per- 
manently lock the (j) and dof. For a given framework. 



the alternative calculation for t{!F) is to use the pebble 
game algorithm |l4i Il5l|. where the distance constraints 
with lowest ranks are placed in the network first. The 
propagation algorithm was explicitly tested against 
exact calculations using the pebble game. Although pref- 
erential constraint counting offers an exact calculation 
method, by incorporating the rigidity propagation rule 
into a transfer matrix, t{J-) no longer requires explicit 
calculation on each framework in the ensemble. 



B. Transfer Matrix and the Partition Function 

The transfer matrix is constructed from a direct prod- 
uct space formed by a triplet conformational state de- 
noted by \X,x,y,z), where A is one when a hydrogen 
bond spans the x,y,z triplet, zero otherwise and x,y,z 
are either alpha helix (a) or coil (c). A triplet is com- 
pletely specified as 

triplet state = |A, a;, y, z) (g) |ri, r2, r3, r4) , (19) 

where ri and r2 are the ranks of the constraints on the 
phi and psi angle (backbone angles) of the x state, and 
and r4 are the corresponding ranks of the constraints on 
the y state. The 4 ranks on the first two amino acids, the 
presence or absence of a spanning hydrogen bond, and 
the conformational state (helix or coil) of each residue 
together completely specify a state. 

Most elements of the transfer matrix, T, will be zero. 
The non-zero matrix elements have the form given by: 

{X',x' ^y,y' ^ z,z'\ ® {r[,r'2,r'^,r'^\T\X,x,y,z) 

^\rur2,r3,u) ^e^^-e-^'^'- (20) 

where after a propagation to the right the new first amino 
acid corresponds to the prior middle amino acid and the 
new middle amino acid corresponds to the prior right 
amino acid. In addition to this, the matrix element will 
only be non-zero if the set of final ranks in the local 
rigidity state obey the rigidity propagation rules. The 
non-zero matrix element then contributes a Boltzmann 
factor that accounts for both the energy and pure en- 
tropy contributions of the constraints encountered. The 
variables Arp and Ae^ respectively represent the change 
in pure entropy and energy upon propagation along the 
chain. The contribution to Atj, at each propagation step 
is given by the sum of pure entropies of the two con- 
straints that permanently lock the two dof within the 
first amino acid of a triplet. Thus Ar^ is determined 
by the rigidity state space in accordance to step 3 of 
the rigidity propagation rule. In contrast, Acp is deter- 
mined by the conformational state space where it is a 
function of only X[xyz] and it is found by summing the 
hydrogen bond energy given by Uxyz when A = 1 and Uo 
when A = 0, with the torsional force constraint energy 
given by Vx- By construction, the zeros and non-zeros of 
the transfer matrix accounts for the rigidity propagation 
rules, thereby correctly propagating rigidity. 
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FIG. 7: The top schematic describes the backbone of a 13-mer peptide chain in a conformation that has: Torsional constraints 
with pure entropy ranked either 3 or 5, and occasional hydrogen bond constraints (pictorially represented as bars spanning 
three consecutive pairs of dihedral angle dof) with pure entropy ranked either 1, 2 or 4. Each step in the propagation of the 
local rigidity state from left to right is shown along the diagonal. The final ranks that remain after propagating from left to 
right (L — » R) is given on the 3rd to last row. The final ranks obtained by propagating from right to left (R — » L) is given on 
the 2nd to last row. The last row labels the amino acids. Both propagation directions yield 3 rank-1, 12 rank-2, 5 rank-3, 2 
rank-4 and 4 rank-5 independent distance constraints. 



Ignoring boundary conditions momentarily, the (inter- 
nal) partition function could be calculated as: 



(21) 



The method for constructing the transfer matrix, T, is 
explained by working through an example. Consider a 
chain of 13 amino acids where the framework given as 



0101110011000 

cacaaacccaaccss 



(22) 



is one realization taken from an ensemble of 2*^^^+^' 
frameworks describing all accessible chain conformations 
(of a chain of length 13). The numbers of 1 or on top of 
an a or c specify A in a triplet, A[a;j/2:]. A number placed 
over an amino acid describes a hydrogen bond that spans 
it and the next two amino acids to the right. In order for 
a chain of length n to be represented by n-triplets, two 



s solvent states are explicitly shown as being augmented 
at the right end of the chain. Effects of this state arc dis- 
cussed below under boundary conditions. The first and 
last three zeros (in bold) correspond to triplets for which 
an intra-molecular hydrogen bond cannot form. 

The dimension and form of the transfer matrix, T, 
strongly depends on the rank ordering of pure entropies. 
For purpose of illustration, consider the rank ordering: 



pure entropy: < 
rank: 




< 



Icac 
'lace 

4 
(23) 

where rank is associated with the special s- 
conformation, and rank 6 is associated with a hydrogen 
bond that spans a local [ccc] geometry. In this case, ^ccc 
plays no role because it will always be redundant. In this 



< 



5c 
5 
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example, intra-molecular hydrogen bonds that span the 
same nmnber of coil states within a triplet are degener- 
ate. Thus, 7caa = laca = laac and Ucaa = Uaca = Uaac, 

etc. 

The initial product vector that needs to be propagated 
is given as |0, c, a, c)|5, 5, 3, 3), where the symbol ® will 
be dropped from now on. This vector is obtained below 
by considering the process of propagating triplet 0[ssc] 
to 0[sca] before arriving to the current triplet 0[cac]. Us- 
ing the rigidity propagation rule, the 1st matrix mul- 
tiplication by T propagates the initial vector into vec- 
tor |1, a, c, a) |3, 3, 5, 5), while the 2nd matrix multiplica- 
tion gives |0, c, a, a) |2, 3, 3, 3). The shifts in the confor- 
mational states are obvious, and the propagation of the 
local rigidity states are calculated according to Example 
IT51 In fact, the initial configuration of ranks shown in 
Fig. [7| precisely correspond to the framework given in 
Ex. 1221 In the 1st propagation step, the contribution 
of pure entropies from constraints that lock the (/>! and 
■01 dihedral angles is given as Ari = 25 c- The energy 
contribution is Aei — Vc + Uo, which reflects the hydro- 
gen bond energy between peptide and solvent. At each 
propagation step another product vector will be gener- 
ated. The second step takes the vector |1, a, c, a)|3, 3, 5, 5) 
into vector |0, c, a, a)|2, 3, 3, 3). The energy contribution 



is Aea = K+ Ua 



which reflects the intra-molecular 



hydrogen bond energy that depends on local geometry 
[aca]. The pure entropy contribution is given by At2 = 
2"faca, resulting from two rank 2 pure entropy values. All 
matrix elements are determined by energy contributions 
from consecutive triplet conformation states described in 
Ex. 1221 and pure entropy contributions are determined 
by the final rank ordering (from left to right) listed in 
Fig. Ul Some matrix elements generated by the frame- 
work given in Ex. [221 are listed in Table ITll 



1. Boundary Conditions 

In addition to constructing the transfer matrix, T, the 
boundary conditions on both the left and right ends of 
the chain must be specified. The boundary conditions 
are of particular importance for peptides that are exper- 
imentally studied because most often they are less than 
20 amino acids long. The approach taken here is to add 
auxiliary triplet states before and after the chain to take 
into account solvation effects. A requirement that the 
left and right boundary conditions must satisfy is: left to 
right propagation and right to left must yield identical 
results for all observable quantities. This basic require- 
ment is satisfied by the approach used here. 

An infinite number of auxiliary s-conformations are ap- 
pended to the beginning and end of the chain to represent 
bulk solvent. A triplet of auxiliary s-conformations is of 
the form 0[sss], and it is used as a reference state. The 
transfer matrix propagates the triplet 0[sss] into another 
0[sss] triplet with a Boltzmann weight of 1 by defini- 
tion. The auxiliary s-conformations play a passive role 



step 


transfer matrix element 


Boltzmann factor 


1 


(l,a 


c 


a|(3,3,5,5|T|0,c, 


a 


c)|5,5,3,3> 




2 


(0,c, 


a 


a (2, 3,3, S T 1, a 


c 


a)|3,3,5,5) 


g27acag-/3(Va + C/„ca) 


3 


(l,a 


a 


a|(3, 3,3,3|r|0,c 


a 


a>|2,3,3,3> 


g7coo+<5ag-/3(V,+C7o) 


4 


(l,a 


a 


c|{l,3,3,3|r|l,a 


a 


a>|3,3,3,3> 


g27aaag-/5(Va + C/aaa) 


10 


{0,a 


c 


c|(2,2,3,3|r|l,a 


a 


c)|2,3,3,3> 


g27caag-/3(Va + i7„„,) 


11 


(0,c, 


c, 


s|(3,3,5,5|S'|0,a, 


c, 


c) 12,2,3,3} 


g27aaCg-/3(V„ + (7„) 


12 


(0,c, 


s, 


s|(5,5,0,0|i?|0,c. 


c, 


s>|3,3,5,5) 


g25„g-/3(Ve+C7o) 


13 


(0,s, 


s, 


sKO,0,0,0|Q|0,c, 


s, 


s)|5,5,0,0) 


g2«,g-/3(Vc + (7o) 



TABLE II: A short list of selected matrix elements that are 
generated from the framework given in Example 1221 Refer to 
Fig. |7|to check the correspondence between the pure entropy 
contribution Arp on the p-th propagation step with the final 
ranks listed for left to right propagation. 



in the calculation (as if they are not present) except in 
triplets at the ends of the chain where they mix with a- 
or c-conformations within the chain. Physical boundary 
conditions require the local rigidity state of the last 0[sss] 
solvent triplet just before the chain to be equal to the lo- 
cal rigidity state of the first 0[sss] solvent triplet at the 
end of the chain. Furthermore, this local rigidity state 
must be the same for any peptide, regardless of its length 
or composition. Therefore, the local rigidity state for 
the 0[sss] solvent triplet is defined as jr^, r^, r^, r^) where 
Ts = to represent the lowest rank associated with a 
minimum pure entropy, 7^ = 0, which is the lowest phys- 
ically realizable value. Consequently, when propagating 
from one solvent triplet to the next Axp = 0, and by set- 
ting Aep = 0, then the Boltzmann weight of 1 is ensured. 
With these boundary conditions no bulk properties of 
solvent (the reservoir) are calculated, while peptide to 
solvent interactions are taken into account by fluctuat- 
ing constraints acting on the peptide (the system). 

Consider propagating from left to right. Then the left 
boundary condition is most conveniently represented as 
a column vector in the direct product space, denoted as 
\i). The form of the initial vector is given by 



E 

x,y,z 



-/3(A£0ssx+A£0sxy) 



\0,x,y,z)'S)\r^,r^,ry,ry) 



(24) 

The ranks r^, and Vy are respectively associated with the 
pure entropy of a tfdc in conformation state (x of the 1st 
amino acid) and (y of the 2nd amino acid). No entropic 
contributions arise in propagating from the 0[sss] triplet 
to the 0[a::y2:] triplet because of the rigidity propagation 
rule when no hydrogen bonds are present and the defini- 
tion of the special s-conformation. However, AeQ^sx and 
AeQsxy account for solvation energy between the peptide 
and solvent. Here a triplet with no spanning hydrogen 
bond is taken to contribute Uo energy. Therefore, the 
initial state vector simplifies to 



E 

x,y,z 



|0,a;,y. 



' X-} ' x-> ' y ■) ' y / 



(25) 
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The right-end boundary condition is implemented us- 
ing three special transfer matrices that involve the s- 
conformation. Starting from the X[xyz]n-3 triplet, trans- 
fer matrices S, R and Q are defined to respectively prop- 
agate from X[xyz]n-3 to 0[?;2;s] to 0[zss] and finally to the 
0[sss] triplet. These three matrices in succession chan- 
nel all possible local rigidity states accessible at triplet 
X[xyz]n-3 to |rs, Ts, Ts, Ts) whcu the 0[sss] solvent triplet 
is reached. Therefore, the only non-zero component in 
the direct product space after matrix Q is applied is given 
by the vector |0, s, s, s) (g) jr^, r^, r^, Tj), which is denoted 
as I/). By construction, the final vector does not change 
upon further propagation from 0[sss] to all remaining 
o[sss] solvent triplets [33 |. 

Including boundary conditions, the (internal) partition 
function is calculated as: 

Zn = {f\QRST''~^\i) V n>3 (26) 

for homogeneous peptide chains with n amino acids, and 
it involves n matrix multiplications over n triplets. The 
form of Eq. is independent of the direction used to 
propagate rigidity. By inspection the partition function 
for a tripeptide {n — 3) reduces to 

x,y,z 

The expression for Z3 highlights two subtleties about 
the simplifying assumptions invoked here that are worth 
mentioning. (1) Unlike the intra-molecular hydrogen 
bonds, the energy, C/q, for hydrogen bonding between the 
peptide and solvent is not considered to depend on the 
local peptide geometry (specified by [xyz].) (2) No pure 
entropy parameter (given by 70) is associated with the 
peptide-solvent hydrogen bonds because it has been as- 
sumed to be larger than all other pure entropies that 
characterize the four constraint types introduced above. 
As illustrated by the 2nd toy model in section UTTI con- 
straints having a pure entropy greater than all others 
that are always redundant do not contribute entropicly. 
Not allowing for entropic contributions from peptide- 
solvent hydrogen bonds implies the solvent molecules 
(aqueous solution being of primary interest) are unstruc- 
tured around the peptide. In other work, hydration ef- 
fects due to structured water around the peptide is ex- 
plicitly modeled as an additional constraint type. 

2. Generating the complete basis set 

With Eq. H26() at hand, what remains is to generate the 
complete basis set of vectors in the product space. This is 
done during the process of constructing the transfer ma- 
trices. The procedure for generating the transfer matri- 
ces, T, S, R and Q begins by considering all 8 possibilities 
for the starting product space vector. Then propagation 
to all possible next triplets is performed. Each distinct 
vector that is created defines another basis vector. For 





a 


b 


c 


d 


e 


f 


g 


h 


i 


j 


k 


1 


Sa 


1 


1 


1 


2 


2 


2 


2 


3 


3 


4 


4 


5 


5c 


2 


3 


3 


3 


4 


5 


6 


5 


6 


5 


6 


9 


'^aaa 


R 


1 or 2 


1 or 2 


1 


1 


1 


1 


1 


1 


1 


1 


1 


'Yea a 


R 


R 


2 


2 


3 


3 


3 


2 


2 


2 


2 


2 


'Yaca 


R 


R 


2 


2 


3 


3 


3 


2 


2 


2 


2 


3 


'yaac 


R 


R 


2 


2 


3 


3 


3 


2 


2 


2 


2 


4 


'Ycca 


R 


R 


R 


R 


R 


4 


4 


4 


4 


3 


3 


6 


Tcac 


R 


R 


R 


R 


R 


4 


4 


4 


4 


3 


3 


7 


'T'acc 


R 


R 


R 


R 


R 


4 


4 


4 


4 


3 


3 


8 


'Yccc 


R 


R 


R 


R 


R 


R 


5 


R 


5 


R 


5 


R 


Dim(T) 


16 


16 


28 


60 


60 


96 


140 


200 


244 


376 


436 


444 



TABLE III: Some examples of possible results that can be 
obtained after sorting the set of DCM pure entropies from 
lowest to highest, and then assigned ranks of 1 and greater 
respectively. Pure entropy values listed on the leftmost col- 
umn are rank-ordered in the various columns. The letter R 
indicates that the constraint is always redundant, and there- 
fore is always ineffective in reducing conformational entropy. 
The last row gives the dimension 13411 of the transfer matrix T 
for the particular rank ordering. Column (h) corresponds to 
the rank ordering used in Example 1231 Column (1) is similar 
to column (h) except that degeneracy is lifted from all the 
yxyz pure entropies. 

each basis vector that was not previously generated, it 
is propagated to all possible next triplets. Eventually 
the same vectors continue to be generated by recursively 
considering all vectors — indicating a complete basis set 
is formed. It is worth mentioning that the product space 
is ergodic, in the sense that starting from any vector rep- 
resenting a triplet state of the peptide chain, any other 
vector can be reached by some number of transfer matrix 
multiplications. In some cases, this number can be quite 
long, depending on the size of the transfer matrix. A pri- 
ori, the number of distinct product space vectors is not 
known because the number of local rigidity states must 
be calculated using the rigidity propagation rule. In Ta- 
ble mil the dimension, M, of the product vector space 
is listed for several choices of rank-orderings. A large 
matrix size is an indication of the long-range nature of 
rigidity that manifests itself as molecular cooperativity. 



C. DCM Results Compared to Monte Carlo 
Simulation 

The transition from a rigid a-helical state to a flexible 
coil state is characterized by helix content, which serves 
as an order parameter. The helix content is defined as 
the average fraction of amino acids in the chain having 
(j) and ip dihedral angles of a-helix geometry. The con- 
formational state of the first and last amino acids are 
explicitly taken into account. Helix content is given by 
the number of amino acids in the a-helical conformation 
divided by the number of amino acids in the chain. Ap- 
plying standard transfer matrix methods, helix content 
and specific heat are numerically calculated for any spec- 
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ified set of model parameters. Using simulated annealing 
methods, the DCM parameters were optimized to fit teg 
Monte Carlo (MC) simulation data for poly-alanincS 
of length 10 in both gas phase (no solvent) and model-i 
water solvent 1, as well as MC simulation data [s^l for 
chain lengths of 10, 15, 20 and 30 in model-water solvent 
2. 

The DCM parameters describing the backbone dof for 
a homogeneous peptide in solvent include { 14, Sa, Vc, 
6c }■ Since the amino acids located at the N- and C- 
termini are exposed to solvent differently, it is expected-, 
that the backbone parameters for the first and last amines 
acids should be modified. To keep the number of modef 
parameters to a minimum, the set of parameters givei#; 
by { V^, 6'^, V^', 6'^ } are used for both the N- and Co 
termini. Besides these 8 parameters describing dihedral 
angle characteristics along the backbone, 17 parameters 
describe hydrogen bonding. To obtain a more manage- 
able number of model parameters, many hydrogen bond 
parameters are considered to be degenerate, where it is 

assumed that 1) Ucca = Ucac = Uacc, 2) Ucaa = Uaca = 
Uaac: 3) ^cca — ^cac — ^acc: 4) Tcaa — ^aca — ^aac- This 

simplification reduces the number of hydrogen bond pa- 
rameters to 9. Taking advantage of the arbitrariness in 
absolute energies and entropies, the parameters 7aaa, f^o, 
Va and Vjj' can be preset without affecting the helix con- 
tent or the specific heat. Therefore, all backbone dof are 
fully described by thirteen (8-1-9 — 4) DCM parameters. 

Fitting the DCM to MC simulation of poly-alanine re- 
quires additional parameters to account for the flexibility 
in the alanine side-chain. The side-chain of alanine con- 
sists of one dihedral angle between the Cq, and Gp atoms 
as shown in Fig. I^s. An additional torsion constraint 
type was applied to this single side-chain dihedral angle. 
The side-chain torsion constraint is partitioned into two 
geometrical bins. Only differences in energy and pure 
entropy between the two states are required, which are 
characterized by {Vs, 5s). Since no interactions are con- 
sidered between an alanine side-chain with the backbone 
or other side-chains, the values of {Vg, 5s) have no affect 
on helix content, but do affect specific heat. Another fit- 
ting variable, Cf,, (not a model parameter) is introduced 
to represent a constant baseline for the specific heat. The 
variable, Cb, is required because the DCM is defined at 
a coarse grained level, and as such it cannot account for 
residual energy fluctuations. 

In total, 16 variables are to be determined by fitting 
to helix content and specific heat data generated by MC 
simulation ,36j^J- Although each DCM parameter has 
a physical basis, 16 variables creates the unfortunate 
problem that helix content and specific heat can be si- 
multaneously fitted with a multitude of excellent best 
fit solutions. This over parameterization can be quickly 
avoided, however. An important aspect of the DCM is 
that although many parameters have been initially gen- 
erated when the set of constraint types were defined for 
the helix-coil system; there is no size dependence. Fur- 
thermore, the number of parameters grow slowly when 
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FIG. 8: The (a) helix content and (b) specific heat from two 
Monte Carlo simulations are shown. The deviation between 
the two simulation data for chains of length 10 creates an 
intrinsic error that prevents finding a "good" fit when both 
results are treated as a single solvent type. Ignoring these 
deviations makes the meaning of goodness not sufliciently re- 
strictive, which allows too many "good" parameter solutions. 
Instead, this data is treated as two different solvents, where 
(squares, circles) represent model-water 1 and 2 respectively. 



fitting to different solvents because no solvent depen- 
dence is assumed for: 1) intra-molecular hydrogen bond 
parameters, 2) backbone dihedral angle parameters not 
depending on coil conformations, 3) side-chain dihedral 
angle parameters and 4) the specific heat baseline. 

The cohort of MC data allows 12 curves to be fitted si- 
multaneously. Superscripts g, 1 and 2 are used to respec- 
tively refer to gas phase and model-water solvents 1 and 
2. Both model- water solvent 1 and 2 refer to the MC data 
generated using the ECEPP/2 force field Initially, 
it was assumed that the model- water solvent of both sim- 
ulations could be treated identically, since both groups 
used the same force field. However, as shown in Fig. |H| 
there are sufficient differences between the chain length 
10 data to warrant treating them as different model-water 
solvents. Between the two model-water solvents, 10 sol- 
vent independent parameters are in common, and (5-1-5) 
solvent dependent parameters are required. Including the 
gas phase data requires 5 more solvent dependent param- 
eters. In total, 25 fitting parameters to 12 distinct curves 
eliminates over-fitting. 

Interestingly, it was found (from several good best fits) 
that some parameters are consistently in close proximity 
to one another. A greater fitting error was exchanged 
for a maximum reduction of free parameters |39j . Specif- 
ically, it was possible to obtain good fits when forcing 
different parameters that were found in close proximity 
to be equal. This results in demanding: \) 5], = 51 = 5^, 
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FIG. 9: Best fit to helix content for gas and model- water 1, 
obtained by simultaneous fitting 19 parameters to the cohort 
of MC data. 
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FIG. 10: Best fit to specific heat for gas and model-water 1, 
obtained by simultaneous fitting 19 parameters to the cohort 
of MC data. 



2) Vc' - 3) ' ^ V: ^ 4) 5^ 1 = and 5) U^, ^ 
U§ — as suggested by the unconstrained fits. With this 
reduction, 19 free parameters were used to fit 12 distinct 
curves simultaneously. 

The results of the simulated annealed best fits are given 
in Table IIVI for solvent independent DCM parameters, 
and Table for solvent dependent DCM parameters. 
Figuresl^landElrespectively show the fit of helix content 
and specific heat for both gas phase and model- water sol- 





aaa 


aca 


cac 


ccc 


Uxyz 


-4.637 


-2.827 


-2.339 


0.000^ 




-4.95 ± 0.39 


-3.11 ± 0.32 


-2.56 ± 0.33 


cooot 


yxyz 


2.0001" 


2.149 


2.760 


2.917 




2.000^ 


2.19 ± 0.07 


2.81 ± 0.04 


2.99 ± 0.12 



vent 1. FigureslTTlandlT^respectivelv show the fit of helix 
content and specific heat for all chain lengths in model- 
water solvent 2. Good fits to helix content were achieved 
for all 6 data sets, with the chain length of 30 in model- 
water solvent 2 showing greatest deviations in the helical 
phase. Likewise, the fits to specific heat were in remark- 
ably good quantitative agreement, considering that the 
DCM parameters are taken as temperature independent 
over a 400 K temperature range. Moreover, employing 
temperature dependent parameters appears unnecessary 
for removing systemic error, because it can be attributed 
to the over-simplified model of representing the peptide- 
solvent hydrogen bonding as a single state. Overall, the 
minimalist network rigidity model has successfully cap- 
tured the essential physics that the MC simulation does. 



Va 


5a 






Vs 


Ss 


0.000^ 


2.656 


0.0001" 


2.0001" 


1.590 


3.614 


o.ooot 


2.56 ± 0.24 


o.ooot 


2.0001" 


1.57 ± 0.13 


3.38 ± 0.15 



TABLE IV: Listing 9 solvent independent parameters. Units 
of energy is in kcal/mol, and pure entropies are dimensionless. 
The numbers with 3 digits represent a typical best-fit solution, 
which were used to generate figures |^ 1101 1111 1121 and 1141 
The numbers directly below these are the averages obtained 
over 8 typical simulated annealed fitting solutions, including 
standard deviations for statistical errors. In addition to these 
DCM parameters, the specific heat baseline was considered 
solvent independent and was given as: 0.00133 kcal/(mol K) 
with average and standard deviation given as 0.0014 ± 0.0001 
kcal/(mol K). Arbitrarily fixed parameters are indicated with 
at. 



V. DISCUSSION 

The toy models in section Hill and the helix-coil transi- 
tion in section Hvl demonstrate how generic rigidity cal- 
culations are used to construct a partition function at 
finite temperatures. Each framework in the ensemble is 
weighted by a conformational degeneracy, e"^, that de- 
pends on the type of constraints present and their specific 
placement relative to one another. Effectively, the con- 
formational degeneracy represents the free volume avail- 
able to a particular framework. It has long been recog- 
nized |40l | that free volume plays an important role in 
both phase change and relaxation in structural glasses. 
In the DCM, free volume is quantified by t(JF), which de- 
pends on the strongest independent constraints that limit 
motion. A direct connection between free volume and 
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FIG. 11: Best fit to helix content for model-water 2, obtained 
by simultaneous fitting 19 parameters to the cohort of MC 
data. The large deviation seen in the chain of length 30 is at 
an acceptable level when the trust-worthiness of the MC data 
in the helical phase itself is factored in. 
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FIG. 12: Best fit to specific heat for model-water 2, obtained 
by simultaneous fitting 19 parameters to the cohort of MC 
data. A systematic fitting error can be seen, where the DCM 
(as presented here) predicts too fast of an increase in the 
maximum peak as a function of chain length. 



the degree of mechanical flexibihty is established through 
network rigidity — an inherently long-range cooperative 
interaction. Although the importance of rigidity in the 
conceptual understanding of structural transitions is not 
new, the DCM allows the role of network rigidity at finite 
temperatures to be calculated quantitatively. 

In some respects the DCM is similar to a normal mode 
analysis in that entropies are additive over independent 
degrees of freedom. If the system of interest can be well 
approximated as a network of coupled harmonic oscilla- 
tors, then the normal modes define an appropriate set of 
independent coordinates. However, normal mode anal- 
ysis applied to the soft condensed phase is subject to 
difficulties because of an-harmonic potentials that 
limit the range of validity over the assumed harmonic 
motions. In the DCM, the "strength" of a constraint 
is inversely proportional to its free volume quantified by 
a pure entropy. An extremely weak constraint having 
a large free volume will pose no effective restrictions on 
conformational freedom. Although normal mode analy- 
sis is not intrinsically suited to deal with bonds breaking 
and forming via thermal fluctuations, a self-consistent 
phonon theory 42] has been used to account for break- 
ing and forming of hydrogen bonds in protein structure. 
Both the DCM and normal mode analysis offer approxi- 
mation schemes, but from opposite directions. For exam- 
ple, soft an-harmonic (or fiat) potentials are easier to deal 
with in the DCM because they require less geometrical 
partitioning. 

The DCM explicitly accounts for fluctuating topologi- 
cal constraints, allowing a global picture to emerge in un- 
derstanding structural self-organization. From the three 



worked examples presented, it is seen that: 1) The ef- 
fectiveness of a constraint in changing the free energy of 
the system depends on temperature and its location in 
relation to all other constraints. 2) Molecular coopera- 
tivity derives from competition between frameworks hav- 
ing different energetic and entropic contributions. More 
generally, a change in thermodynamic conditions (tem- 
perature, pressure, pH, etc) can lead to a global re- 
arrangement of optimally well placed constraints. 3) 
The most probable microstates will often correspond to 
a characteristic pattern of constraints, manifesting it- 
self as structural self-organization. For example, in the 
helix-coil transition, mechanical frameworks switch char- 
acter as some constraint types tend to break (alpha- 
helical torsion constraints and backbone hydrogen bond 
constraints) while others tend to form (coil torsion con- 
straints). This type of structural self-organization has 
been produced in athermal network rigidity models 
applied to covalent glass networks, where redundant con- 
straints were suppressed to avoid strain energy. In other 
work to be published elsewhere [s^ , hydration effects are 
included in the DCM. Structured water around a hydra- 
tion site is considered to impose another type of con- 
straint on the peptide, where it is enthalpicly favorable 
and entropicly unfavorable. Under certain thermody- 
namic conditions, cold denaturation occurs as the char- 
acter of constraint type and pattern change. 
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A. The Helix-Coil Transition 



The helix-coil transition has been studied for nearly 
50 years 0, ^'^^ ^ simple statistical mechanical 

approach, the Zimm and Bragg (ZB) and Lifson and 
Roig 's^l (LR) models are commonly used. The ZB and 
LR models share two types of parameters — referred to 
as nucleation and propagation parameters. Only two and 
three dimensional transfer matrices are required for the 
ZB and LR models respectively 46j. Without a doubt, 
the application of the LR model to explain experimental 
data has been very fruitful over the years. The question 
then arises, why use the more complicated DCM when 
the traditional LR model will do? 

The DCM clearly makes a distinction between a coop- 
erative process governing a structural transition to that 
of a non-cooperative process that happens to have a sharp 
transition. A true signature for the degree of cooperativ- 
ity is in how the transition temperature depends on chain 
length. The MC simulation data from Y. Peng et. al. 
jSTj shows a large degree of cooperativity, as the transi- 
tion temperature dramatically increases by 130 K when 
increasing chain length from 10 to 30. The DCM is able 
to capture this degree of cooperativity without requiring 
temperature or size dependent model parameters. 

For comparison, the LR-model was also fitted to 
model- water solvent 2 MC data [33. LR relates the so 
called nucleation parameter, w, and the propagation pa- 
rameter, w, to partial configurational integrals defined 
by coarse-grain sections of dihedral angle space (helical 
or coil conformations) along the backbone. These di- 
mensionless parameters are expected to be functions of 
temperature, where — fcTlnw and — fcTlnu; represent mi- 
croscopic component free energies, and are treated phe- 
nomenologically The LR parameters can be writ- 

ten in a form similar to the DCM, where v = e^^" and 
^ _ g25„g-/3v„^ Here the parameters { 6^, Sio and Vw 
} are taken as temperature independent, and fitted to 
the 4 helix content curves. Note that the v parameter is 
assumed here to be temperature independent, following 
common practice. Since the LR-model as commonly in- 
voked does not explicitly account for end effects, two ad- 
ditional parameters (not model parameters) are required 
to account for helix content baselines. 

Helix content for chain lengths 10, 15, 20, and 30 were 
individually fitted with the LR-model, each with 5 fitting 
parameters, requiring a total of 20 parameters. FigurelTSI 
shows the simulation data for chain length 10 and 30, as 
well as the best fit for each size. In addition, the predic- 
tion for helix content for chain length (30, 10) using the 
fitted parameters from chain length (10, 30) respectively 
are shown. The LR-model in its three parameter form 
does a very good job in fitting to each helix content curve. 
However, as Fig. ^| clearly shows, the fitted parameters 
obtained for one size, cannot be used to predict helix 
content of a different size. The LR parameters are inher- 
ently non-transferable because they depend on the size of 
the system. Although the sharpness of the helix content 
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FIG. 13: The solid lines through the MC data for model- 
water solvent 2 show the best 5 parameter fit for each size 
chain separately using the standard LR-model. The dashed 
line on the (left, right) corresponds to the LR prediction of 
helix content for a chain of length (30, 10) using the best fit 
parameters obtained from chain length (10, 30). 



curve is accounted for in the so called nucleation param- 
eter, the mechanism creating the cooperativity is com- 
pletely missed in this simplest three-parameter form. To 
be fair, a simultaneous fit to all four helix content curves 
was attempted using 12 parameters (4 model parame- 
ters and 8 baseline parameters). The extra LR-model 
parameter was introduced by letting v = e^^^ e~^^'" . Not 
surprising, no good simultaneous fit solutions was possi- 
ble. 

Bierzynski and Pawlowski (BP) ^isl show that the nu- 
cleation parameter is required to be a function of chain 
length due to the long range character of helix forma- 
tion. It seems unsporting to us to predict a helix with 
parameters that vary with chain length. Furthermore, 
BP demonstrate that a common implementation of the 
LR model predicts thermodynamic state functions that 
are erroneously path dependent: giving slightly different 
results depending on which end of the peptide the com- 
putation begins at, and wrong predictions when prenu- 
cleated peptides are considered. Fundamentally, the so 
called nucleation parameter is ill-defined for use in calcu- 
lating a partition function 1481. and its wide spread use 
has created misconceptions [i^. The DCM avoids these 
issues. The DCM has long range character through net- 
work rigidity, thus recourse to length dependent param- 
eters is unnecessary. 

The DCM is actually very similar to the LR model. 
Both models are based on parameters that can be de- 
rived from local microscopic free energies. The difference 
is that the DCM attempts to include non-local coopera- 
tive interactions explicitly by using generic rigidity calcu- 
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FIG. 14: The (a) transition temperature and (b) maximum 
value of the specific heat as a function of chain length for 
gas phase and model- water solvents 1 and 2. The parameters 
used in generating these curves are given in Tables HVI and IVl 



lations to account for the non-additivity of entropy. Yet 
it is possible to construct a DCM where there is very lit- 
tle entropic competition between constraint types, such 
as given in column (a) in table 11111 In this case, the 
DCM for a hclix-coil transition is identical to the gen- 
eral form of the LR model. It is worth noting that the 
two commonly used LR parameters |4^, (v,w), are only 
a subset of sixteen parameters that must be defined for 
each possible type of propagation (i.e. aac aca, and 
15 more). Lifson and Roig simplified the model consid- 
erably to solve it analytically. Unfortunately, the advan- 
tages of simplifying the mathematical form of the model 
has lead to non-transferability of parameters that have 
created many inconsistencies in the literature ^] . With 
modern computers it is no longer necessary to invoke the 
two parameter form of the LR model. The disadvantage 
of retaining the two parameter form is that the parame- 
ters become strongly dependent functions of temperature 
[alElEll and chain length 1^1,111,42,]. 



B. Solvent Effects on the Helix-Coil Transition 

The DCM parameters naturally divide into two cate- 
gories that are expected to be either weakly or strongly 
dependent on solvent conditions. Moreover, the results 
obtained by fitting the DCM to MC simulation data in- 
dicate the essential physics of the helix-coil transition 
for poly-alanine is well described by the 10 solvent inde- 
pendent parameters in table Hvl and 5 solvent dependent 
parameters given in tabled For these DCM parameters 
Fig. El shows the affect of solvent on the helix-coil tran- 



sition. Comparing gas phase and model-water solvents 1 
and 2 with each other, we see that the transition temper- 
ature and the sharpness of the transition can be substan- 
tially modified. Not surprising, the gas phase transition 
temperature is elevated with respect to model-water sol- 
vent, because alternate hydrogen bonds from backbone 
to solvent cannot replace intra hydrogen bonds as they 
break. The greater energy cost to unravel the rigid he- 
lical structure requires a higher transition temperature 
where gains in conformational entropy can begin to com- 
pensate. It is also seen that the transition temperature as 
a function of chain length for model-water solvents 1 and 
2 are very similar, as one might expect if the differences 
shown in Fig. |S1 are viewed as systematic uncertainties, 
rather than two different solvents. 

The sharpness of the transition, as characterized by 
the maximum in specific heat, is found to depend on the 
particular combination of solvent dependent parameters. 
With respect to the gas phase, from Fig. E]it is seen that 
the transition sharpens considerably for model-water sol- 
vent 1, but remains virtually the same for model- water 
solvent 2. These results correctly reproduce the observa- 
tions of the authors that generated the original MC sim- 
ulation data [31,133. Of course, model- water solvents 1 
and 2 are actually the same, albeit systematic uncertain- 
ties shown in Fig. [S] This uncertainty and the differ- 
ences seen in Fig. E] are the result of differences found 
in parameters {Uo and Vc), listed in Table IVl Therefore, 
it is easy to interpolate between the two different MC 
results within a 2-dimensional parameter space. The in- 
terpolation was done by fitting only to model-water sol- 
vent 2 data. Letting Uo range between —1.4 and —0.4 
kcal/mol, a one parameter fit to obtain the optimal Vc 
was performed, while holding Uo and all other 17 pa- 
rameters given in Tables HVl and Ivl fixed. It was found 
that the DCM model predictions smoothly change as a 
function of Uo- In Fig. I15L the helix content is shown 
for model- water solvent, but now the uncertainties in the 
parameters Uo and Vc encompass both MC simulation 
results for the chain length of 10. Chain lengths of 10, 15 
and 30 are shown in fig. 1151 which gives some indication 
of the true uncertainties in helix content for model-water 
solvent (using the ECEPP/2 force field). 

In the DCM presented here, solvent effects on the helix- 
coil transition were described well using just five param- 
eters. A better description is possible by including more 
states representing the peptide to solvent interactions. 
In other work (35] hydration constraints are included, for 
example. Furthermore, inverted transitions from coil to 
helix as temperature increases from low to high can be 
described. 



C. Molecular Cooperativity 

Admittedly, the DCM requires more effort than the 
LR model to describe the helix-coil transition. The ben- 
efit of this additional labor is that the final parameter- 
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FIG. 15: Large graph: The dashed and sohd curves show 
predictions for chain lengths 10, 15 and 30 that are obtained 
in a one-parameter best fit for Vc when setting Uo = —0.4 and 
Uo = —1.4 respectively. The circles and squares show the re- 
sults of MC simulation from Y. Peng et. al. (3^ and Okamoto 
[3^ respectively. Inset: The solid line shows the best fit value 
of Vc along the ordinate as a function of Uo along the ab- 
scissa. The (circle, square) indicates the Vc and Uo values 
used to generate the (dashed, solid) lines in the large graph. 
Due to the intrinsic uncertainty in the MC data, perhaps the 
best DCM parameter estimates are: Uo = —0.900 kcal/mol 
and Vc = —0.485 kcal/mol, which split the differences roughly 
in half. 



estimated to be ~ 40 amino acids for both gas and model- 
water solvents, also corresponding to the inflection point 
on the curves for maximum specific heat. The correla- 
tion length is quite long considering that in 1-dimension 
thermal fluctuations severely reduce the eflectiveness of 
the long-range nature of network rigidity. 

The primary motivation for introducing the DCM is 
to study flexibility and stability in proteins ^5!?i. The 
concept of a rigidity correlation length applies to any 
topology of constraints, such as found in globular pro- 
teins. The DCM can be used to directly study the af- 
fect of hydrogen bonds on protein stability, which has 
been difficult to ascertain experimentally and theoreti- 
cally. Not only does the answer depend on the speciflc 
thermodynamic conditions, but also on the particular hy- 
drogen bond in question. Stability questions are partic- 
ularly difficult to answer when there is a high degree of 
cooperativity in a molecular system. Proteins are partic- 
ularly interesting, where it has been suggested that the 
folding pathway is encoded in the hydrogen bond net- 
work In addition, mechanical stability probed by 
single-molecule force spectroscopy appears to depend on 
the kinetic stability of the hydrogen bond network [s^ — 
also a cooperative process that can be addressed within a 
DCM. More generally, the DCM describes protein folding 
as a manifestation of a structural self-organization caused 
by the topological optimization of constraint placement. 
Indeed, all model calculations presented here suggest that 
the most probable frameworks correspond to well defined 
structural units (such as secondary structure, protein do- 
mains, etc) that change character under different thermo- 
dynamic conditions. 



ization for understanding the nature of competing mi- 
croscopic interactions becomes considerably less compli- 
cated in the end. In particular, the DCM offers the poten- 
tial of having transferability of parameters. Parameter 
transferability is intimately tied to the proper summa- 
tion of component entropies, which is quantified in the 
DCM via the long-range underlying mechanical interac- 
tion between constraints. From the fitted model param- 
eters (given in Tables HVl and W \ it is seen from column 
(i) in table mil that a 244 x 244 transfer matrix was nec- 
essary to describe the MC simulation results. The large 
size of the transfer matrix is an indication of a high de- 
gree of cooperativity among the hydrogen bonding along 
the backbone. 

In exchange for the non-transferable nucleation param- 
eter to characterize the degree of cooperativity, it is char- 
acterized by a rigidity correlation length in the DCM. 
The rigidity correlation length gives an indication of how 
far away from a point of interest that perturbations in 
constraints will lead to little affect at the point of inter- 
est. It can be roughly estimated at the helix-coil tran- 
sition by locating the crossover point where the shift in 
transition temperature becomes small as chain length in- 
creases. From Fig. ^1 the rigidity correlation length is 



VI. CONCLUSION 

The DCM generalizes the T = generic rigidity calcu- 
lation to finite temperatures by quantifying constraints 
with energetic and entropic characteristics. The effec- 
tiveness of a constraint strongly depends on its type 
and where it is placed in the network in relation to all 
other constraints. Generic rigidity is then used as an un- 
derlying long-range mechanical interaction between con- 
straints, providing the mechanism for the non-additive 
property of component entropies. The DCM accounts 
for fluctuating topological patterns of constraint place- 
ments. From a computational point of view, the net- 
work rigidity calculations are easy to implement by in- 
voking fast graph-algorithms that are available in two- 
dimensions J]^ Il4| for general networks, and in three 
dimensions pla | for bond-bending networks. 

In this paper, a DCM applied to the helix-coil transi- 
tion was considered in detail and compared to the Lifson 
Roig model. Thermodynamic state functions are calcu- 
lated exactly, without recourse in using a nucleation pa- 
rameter. The helix-coil transition in peptides is special 
only in that it can be exactly solved as a 1-dimensional 
system using a transfer matrix method. Our use of the 
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DCM has been to coarse grain into the smallest num- 
ber of states necessary to describe the physics at hand. 
For example, alpha helix and coil backbone states are 
used in modeling the helix-coil transition. In this work, 
12 different thermodynamic response functions where de- 
scribed well by the DCM using 20 parameters that are 
independent of temperature and chain length. The en- 
tropic parameters indicate that the degree of cooperativ- 
ity extends over approximately 40 amino acids. 

As a practical application, the DCM may be able to 
predict helix formation in proteins with parameters de- 
rived from helix-coil transition studies. The DCM is 
readily scalable to include more types of interactions, 
where far more backbone states could have been intro- 
duced such as 3-10 helix, beta sheet, beta turn, hydrated 
or not hydrated, buried or surface exposed. If the DCM 
parameters are found to be transferable (as we expect) 
flexibility and stability studies on proteins will be far 



more feasible, because the DCM gets more physics out 
with fewer parameters. The DCM has the potential to 
gain a better understanding of these issues from a me- 
chanical point of view. More generally, the DCM gives 
a description of a coarse graining procedure to describe 
physical systems. Its applicability goes beyond biopoly- 
mers, offering a new paradigm not previously available. 
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Uo 




5c 






Gas 


-0.399 


-0.321 


3.603 


-1.344 


4.034 




-0.67 ± 0.34 


-0.35 ± 0.08 


3.58 ± 0.09 


-1.18 ± 0.22 


3.62 ± 0.25 


Solvent 1 


-1.154 


-0.321 


3.603 


-3.095 


3.523 




-1.40 ± 0.33 


-0.35 ± 0.08 


3.58 ± 0.09 


-2.73 ± 0.18 


3.55 ± 0.14 


Solvent 2 


-0.399 


-0.857 


3.603 


-3.095 


3.523 




-0.67 ± 0.34 


-0.87 ± 0.09 


3.58 ± 0.09 


-2.73 ± 0.18 


3.55 ± 0.14 



TABLE V: List of 5 solvent dependent DCM parameters per solvent type. The same units and notation are used as in table 



